Jedi SAS Tricks: The DSTODS2 Procedure

26
Mastering the SAS DS2 Procedure second edition - the DS to DS2 Procedure
What new things will I learn in the second edition of
Mastering the SAS DS2 Procedure?

You have the infrastructure, and you know that DATA step would run so much better in DS2, but it’s complicated and you don’t know how to get started. Well, if you have SAS 9.4M5, rejoice! The DStoDS2 procedure is here, and it’s a humdinger! (No, Chris, not a Hemedinger… ;-) It's one of the many new things you can learn about in the new second edition of "Mastering the SAS DS2 Procedure - Advanced Data Wrangling Techniques"

Here are a few important caveats we should be aware of before diving in:

  1. PROC DStoDS2 can’t translate all possible DATA step syntax, but it does support a huge subset. Any lines of code lines in the traditional DATA step that cannot be translated will appear as comments in the output. Any comments in the original DATA step program are removed.
  2. PROC DStoDS2 translates only one DATA step at a time. The DATA step to be translated must be saved to its own text file which will serve as input.
  3. In the traditional DATA step, character variable lengths are measured in bytes, but in DS2 fixed-length character variable lengths are specified in characters. If your data has multi-byte encoding, adjust any LENGTH statements in the original DATA step before running PROC DSTODS2 to specify the variable length in characters, so that the DS2 DATA program will specify the correct lengths upon conversion.
  4. The resulting DS2 program might not be syntactically complete. You will have to add a RUN statement and any commented sections will have to be converted by hand. Perhaps a few other tweaks will be necessary before it executes and produce the desired results.

Having said all of that, PROC DStoDS2 makes converting traditional DATA step code to DS2 a snap! Let’s try it out on a sample program that as a lot of typical DATA step features:

data audi   (drop=make count)
     bmw    (drop=make count)
     counts (keep=make count);
   if _n_=1 then do;
      put 'And so it begins!';
   end;
   set cars;
   by Make;
   if first.make then Count=0;
   Count+1;
   select (Make);
      when ("Audi") output audi;
      when ("BMW")  output bmw;
      otherwise put Make= Model= ' - how did YOU get in here?';
   end;
   if last.make then output counts;
run;

We’ll save that to a file named DStoDS2_data_step.sas. Next, we’ll write and execute a PROC DStoDS2 step to convert the DATA step program to DS2 and save the new program as DStoDS2_data_step_converted.sas:

proc dstods2 in="&path/DStoDS2_data_step.sas" 
             out="&path/DStoDS2_data_step_converted.sas";
run;

A quick look at the output shows it’s not too pretty, but it seems functional enough. Because I’m working in SAS Studio, a quick click on the Format Code button makes it more legible. For the purposes of the blog, I’ll format it a little by hand:

data AUDI(DROP=(MAKE COUNT) ) 
     BMW(DROP=(MAKE COUNT) ) 
     COUNTS(KEEP=(MAKE COUNT));
   method run();
   if _N_=1.0 then
      do;
         put 'And so it begins!';
      end;
   set CARS;
   by MAKE;
 
   if FIRST.MAKE then
      COUNT=0.0;
   COUNT + 1.0;
 
   select (MAKE);
      when ('Audi') output AUDI;
      when ('BMW') output BMW;
      otherwise put MAKE=MODEL=' - how did YOU get in here?';
   end;
 
   if LAST.MAKE then
      output COUNTS;
   ;
_return:
   ;
end;
enddata;

That’s not too bad, actually. I have to add the PROC DS2 statement and the RUN and QUIT, obviously. I’ll remove that unused _return: label. And for efficiency’s sake, I’ll convert the IF _N_ =1 DO group into an INIT method:

proc ds2;
data AUDI_DS2(DROP=(MAKE COUNT)) 
     BMW_DS2(DROP=(MAKE COUNT)) 
     COUNTS_DS2(KEEP=(MAKE COUNT));
   method init();
         put 'And so it begins!';
   end;
   method run();
      set CARS;
      by MAKE;
      if FIRST.MAKE then COUNT=0.0;
      COUNT + 1.0;
      select (MAKE);
         when ('Audi') output AUDI_DS2;
         when ('BMW') output BMW_DS2;
         otherwise put MAKE=MODEL=' - how did YOU get in here?';
      end;
      if LAST.MAKE then output COUNTS_DS2;
   end;
enddata;
run;
quit;

Now, that seemed almost too easy, didn't it? Let's run my shiny new DS2 program, and check the log to make sure everything is running smoothly. Oops! It looks like my DS2 program produced a warning in the log:

WARNING: No DECLARE for assigned-to variable count; creating it as a global variable 
         of type double.

So back I’ll go to the editor, and add a declaration for the COUNT variable. This time it runs without a warning. Now, I’m a suspicious guy, so I’ll sort the outputs from both processes into the same order and run PROC COMPARE to ensure it’s all working as expected. Sure enough, I get that lovely “No unequal values were found. All values compared are exactly equal.” report.

So that’s the scoop – PROC DStoDS2 can ingest some pretty complex DATA step code and do 99% of the work needed to convert it to DS2. Remember Jedi Programming Rule #1? If not, here you go:
“Lazy programmers are great programmers.”

So lazy programmers of the world, rejoice - the DStoDS2 procedure has arrived! And, if you’re too lazy to copy the code in this post by hand, you can download the ZIP file from this link.

Until next time, may the SAS be with you!
Mark

Tags
Share

About Author

SAS Jedi

Principal Technical Training Consultant

Mark Jordan (a.k.a. SAS Jedi) grew up in northeast Brazil as the son of Baptist missionaries. After 20 years as a US Navy submariner pursuing his passion for programming as a hobby, in 1994 he retired, turned his hobby into a dream job, and has been a SAS programmer ever since. Mark writes and teaches a broad spectrum of SAS programming classes, and his book, "Mastering the SAS® DS2 Procedure: Advanced Data Wrangling Techniques" is in its second edition. When he isn’t writing, teaching, or posting “Jedi SAS Tricks”, Mark enjoys playing with his grand and great-grandchildren, hanging out at the beach, and reading science fiction novels. His secret obsession is flying toys – kites, rockets, drones – and though he usually tries to convince Lori that they are for the grandkids, she isn't buying it. Mark lives in historic Williamsburg, VA with his wife, Lori, and Stella, their cat. To connect with Mark, check out his SAS Press Author page, follow him on Twitter @SASJedi or connect on Facebook or LinkedIn.

Related Posts

26 Comments

  1. Hello Mark,
    the following code failed indicating INTERWRK.AEHF_DE" does not exist ( it is a SAS View table). Can DS2 use view table in the merge ? Thanks,
    /******/
    %let state=de;

    proc ds2;

    data INTERWRK.TMP_&STATE._ADDRESSES
    INTERWRK.ICF_&STATE._NEWWORKERS_NO_ADDRESS(
    KEEP = ( PIK ADDRESS_YEAR ) );
    method run();
    merge INPUTS.ICF_US_ADDRESSES( IN = IN_A ) INTERWRK.AEHF_&STATE( IN = IN_B
    KEEP = ( PIK YEAR ) RENAME = ( YEAR = ADDRESS_YEAR ) );
    by PIK ADDRESS_YEAR;
    if IN_B;
    if IN_A then output INTERWRK.TMP_&STATE._ADDRESSES;
    else output INTERWRK.ICF_&STATE._NEWWORKERS_NO_ADDRESS;
    ;
    _return: ;
    end;
    enddata;
    run;
    quit;
    /*******/
    ERROR: Compilation error.
    ERROR: BASE driver, Table AEHF_DE does not exist or cannot be accessed or created
    ERROR: Table "INTERWRK.AEHF_DE" does not exist or cannot be accessed
    ERROR: Line 12: Unable to prepare SELECT statement for table AEHF_de (rc=0x80fff802U).
    NOTE: PROC DS2 has set option NOEXEC and will continue to prepare statements.
    23 quit;

    • SAS Jedi

      Cindy,
      As noted in this SAS Communities posting, the problem is that the view was created with PROC SQL. DS2 is only compatible PROC FedSQL, so can't read PROC SQL views. But take heart - if the view is created with PROC FedSQL, the process should run fine. Here is a rudimentary mock-up using the code snip you provided:

      %let state=DE;
      libname INPUTS "%pathname(work)";
      libname INTERWRK "%pathname(work)";

      data INPUTS.ICF_US_ADDRESSES ViewSource (keep=PIK address_year rename=(address_year=YEAR));
      do PIK= 1 to 5;
      do ADDRESS_YEAR=2015 to 2020;
      Address='From ICF_US_ADDRESSES';
      output INPUTS.ICF_US_ADDRESSES;
      if not mod(ADDRESS_YEAR,3) and PIK in (1,4) then output ViewSource;
      end;
      end;
      PIK=5;
      ADDRESS_YEAR=2021;
      output ViewSource;
      run;

      proc FedSQL;
      drop view INTERWRK.AEHF_&STATE force;
      create view INTERWRK.AEHF_&STATE as
      select * from ViewSource
      ;
      quit;

      title "Source Tables";
      proc FedSQL;
      title2 "INPUTS.ICF_US_ADDRESSES";
      select * from INPUTS.ICF_US_ADDRESSES limit 10;
      title2 "INTERWRK.AEHF_&STATE";
      select * from INTERWRK.AEHF_&STATE limit 10;
      quit;

      /*Your code*/
      proc ds2;
      data INTERWRK.TMP_&STATE._ADDRESSES
      INTERWRK.ICF_&STATE._NEWWORKERS_NO_ADDRESS(KEEP=(PIK ADDRESS_YEAR))/overwrite=yes;
      method run();
      merge INPUTS.ICF_US_ADDRESSES(IN=IN_A)
      INTERWRK.AEHF_&STATE(IN=IN_B KEEP=(PIK YEAR) RENAME=(YEAR=ADDRESS_YEAR));
      by PIK ADDRESS_YEAR;
      if IN_B;
      if IN_A then output INTERWRK.TMP_&STATE._ADDRESSES;
      else output INTERWRK.ICF_&STATE._NEWWORKERS_NO_ADDRESS;
      ;
      _return: ;
      end;
      enddata;
      run;
      quit;

      title "Result Tables";
      proc FedSQL;
      title2 "INTERWRK.TMP_&STATE._ADDRESSES ";
      select * from INTERWRK.TMP_&STATE._ADDRESSES limit 10;
      title2 "INTERWRK.ICF_&STATE._NEWWORKERS_NO_ADDRESS";
      select * from INTERWRK.ICF_&STATE._NEWWORKERS_NO_ADDRESS limit 10;
      quit;

      I hope this helps! May the SAS be with you :-)
      Mark

  2. SAS Jedi

    Fernando,
    The capacity to STORE a character value is completely dependent on the storage technology - it really has nothing to do with the capabilities of DS2. For example, we are all familiar with the 32,767 (32K) byte limit for character data stored in SAS data sets. If you were to store a 1M character value to a SAS data set, it would be truncated at 32K. Hive has a 65,535 (64K) character limit (for VARCHAR), so if the same value was stored in Hive, it would be truncated at 64K.

    As for the very long JSON API responses I discussed in the previous comment, I captured the JSON response to a local VARCHAR variable and used a DS2 JSON package instance to parse out the data I wanted. The extracted data was stored in an output table, but the the raw JSON was discarded.
    May the SAS be with you!
    Mark

  3. Hello SAS Jedi,

    Thank you for sharing proc dstods2.

    Is there any way we can convert below proc sql - oracle query to ds2 query

    proc sql;
    options nosgen;
    connect to oracle(user="&oruser." orapw="&orpw." path="@abc" buffsize=15000);
    options sgen;
    create table sample as
    select id,
    waiver,
    opt
    from connection to oracle
    (select to_number(ex_id) as id,
    waiver,
    put(option,z4.) as option
    FROM my_table
    order by ann_fee
    );
    disconnect from oracle;
    quit;

    • SAS Jedi

      Rikin,
      The FedSQL integration with DS2 makes using explicit pass-through SQL a snap. I'm pretty handy with Orcale SQL, and in your demo code I do not recall an Oracle PUT function, so I'm pretty sure the "put(option,z4.) as option" won't work. However, here is sample code showing how to implement this in DS2 using FedSQL explicit pass-through:

      LIBNAME mydb oracle user="&oruser." orapw="&orpw." path="@abc";
      proc ds2;
      data sample;
      method run();
      set{select to_number(ex_id) as id,waiver
      /* This does NOT look like ORACLE SQL to me - if it's not, this won't work
      Consider using the PUT function in the DS2 code to do the conversion */
      , put(option,z4.) as option
      FROM my_table
      order by ann_fee};
      end;
      enddata;
      run;
      quit;

      You can learn more about this technique in my blog post titled "Jedi SAS Tricks: Explicit SQL Pass-through in DS2"
      May the SAS be with you!
      Mark

  4. What I've discovered through trial & error is that a SPDE libname that has any spaces in the paths doesn't work. For example, the first libname below works fine, but the second one doesn't work.

    libname slib spde 'c:\test\'
    datapath=('c:\test\a' 'c:\test\b\')
    indexpath=('c:\test\b\');

    libname slib spde 'c:\test\'
    datapath=('c:\test\a a\' 'c:\test\b b\')
    indexpath=('c:\test\b b\');

    • SAS Jedi

      While this is a great place to discuss general or specific coding issues, it's not a great place to get help for a issues which may depend on your specific SAS installation. SAS Technical Support is the best route to a quick solution for problems like this. I'm going to open a tech support track on this myself.

      As a reminder, for urgent, business-critical issues call +1 (919) 677-8008. Otherwise, I'd suggest using the SAS Tech Support Web Form (http://support.sas.com/ctx/supportform/createForm).

      Before contacting SAS Tech Support, gather this information and have it handy:
      Your company name and the country in which you are operating
      Your name, e-mail address and phone number
      Site number, operating system and software release (see top of a fresh SAS session’s log)
      SAS product involved (for this problem, the product in Base SAS)
      Succinct one-line description of the issue (subject line)
      Detailed problem description, including ERROR OR WARNING messages
      A description of any troubleshooting or research you’ve already done
      Copies of your SAS program, SAS log & any other pertinent files.
      After submitting the web form, you'll receive an automated email with a tracking number. Reply to that email and attach copies of your SAS program and the SAS log for the session which shows the errors / problem.

      I hope this helps!
      Mark

    • SAS Jedi

      Well, it looks like we identified a bug! The response to my tech support track indicated that a fix is planned for a future release which, at the present time, is scheduled for release by the end of 2018. However, as of this writing no fix is available. And because path names cannot be modified for existing SPDE libraries (see http://support.sas.com/kb/14/028.html), no work-around is available either. To avoid this problem going forward, please ensure the paths and file names for components of SPDE libraries do not contain spaces.

  5. Your example works, but when I try it with one of our SPDE libnames, it doesn't work.

    269
    270 libname YR4_12_1 spde 'h:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012'
    271 datapath=('m:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012'
    272 'p:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012'
    273 'l:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012'
    274 'k:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012')
    275 indexpath=('j:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012'
    276 'i:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012')
    277 bysort=no
    278 compress=binary;
    NOTE: Libref YR4_12_1 was successfully assigned as follows:
    Engine: SPDE
    Physical Name: h:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012\
    279
    280
    281 proc ds2 libs=(YR4_12_1 work);
    ERROR: Client unable to establish connection.
    ERROR: The DATAPATH= option specifies physical paths, listed above, that do not exist or are
    inaccessible
    ERROR: PROC DS2 initialization failed.
    282 thread comp /overwrite=yes;
    283 dcl double total;
    284 method run();
    285 set YR4_12_1.BENEFICIARY_SUMMARY_FILE;;
    286 end;
    287 endthread;
    288 run;
    289 quit;

    NOTE: The SAS System stopped processing this step because of errors.
    NOTE: PROCEDURE DS2 used (Total process time):

  6. Thanks for the reply. The test code works fine. However, our goal was to try and use the DS2 procedure to speed up data access to some of our large datasets (100+ million obs) that were created using the SPDE libname engine. I guess it wouldn't work on these, either, since the libref is a series of concatenated paths. Do you know if this problem be addressed in a future release?

    • SAS Jedi

      SPDE libraries are not concatenated libraries. They are distributed libraries using the SPDE engine, which is a completely different beast. SPDE is explicitly listed as a supported DS2 data source and should be no problem to access using DS2. For example, this code runs flawlessly on my SAS 9.4M5 installation (Windows):
      libname slib spde 'c:\sasdata\spde\mainpath'
      datapath=('c:\sasdata\spde\data1' 'd:\sasdata\spde\data2')
      indexpath=('c:\sasdata\spde\index');

      proc ds2 libs=(slib work);
      thread comp /overwrite=yes;
      dcl double total;
      method run();
      set slib.narrow;
      by group;
      if first.group then total=0;
      total+normal;
      if last.group then output;
      end;
      endthread;
      run;

      data summary/overwrite=yes ;
      dcl thread comp c();
      method run();
      set from c threads=4;
      end;
      enddata;
      run;
      quit;

  7. I tried the following code, but got an error. The SASHELP.CARS dataset works fine in a normal dataset, so I'm not sure what the problem is. Any ideas?

    proc ds2;
    data TEST;
    method run();
    set SASHELP.CARS;
    end;
    enddata;
    run;
    quit;

    177 proc ds2;
    ERROR: Client unable to establish connection.
    ERROR: The DATAPATH= option specifies physical paths, listed above, that do not exist or are
    inaccessible
    ERROR: PROC DS2 initialization failed.
    178 data TEST;
    179 method run();
    180 set SASHELP.CARS;
    181 end;
    182 enddata;
    183 run;
    184 quit;

    • I also tried the code from your example and got an error.

      17 proc ds2;
      18 data AUDI_DS2(DROP=(MAKE COUNT))
      19 BMW_DS2(DROP=(MAKE COUNT))
      20 COUNTS_DS2(KEEP=(MAKE COUNT))/overwrite=yes;
      21 dcl double Count;
      22 method init();
      23 put 'And so it begins!';
      24 end;
      25 method run();
      26 set CARS;
      27 by MAKE;
      28 if FIRST.MAKE then COUNT=0.0;
      29 COUNT + 1.0;
      30 select (MAKE);
      31 when ('Audi') output AUDI_DS2;
      32 when ('BMW') output BMW_DS2;
      33 otherwise put MAKE=MODEL=' - how did YOU get in here?';
      34 end;
      35 if LAST.MAKE then output COUNTS_DS2;
      36 end;
      37 enddata;
      38 run;
      ERROR: Compilation error.
      ERROR: BASE driver, Table CARS does not exist or cannot be accessed or created
      ERROR: Table "WORK.CARS" does not exist or cannot be accessed
      ERROR: Line 26: Unable to prepare SELECT statement for table CARS (rc=0x80fff802U).
      39 quit;

      NOTE: The SAS System stopped processing this step because of errors.
      NOTE: PROCEDURE DS2 used (Total process time):
      real time 0.20 seconds
      cpu time 0.17 seconds

    • SAS Jedi

      Robert,
      DS2 does not support concatenated SAS libraries (see Usage Note 51043: Librefs with concatenated libraries are not supported in DS2 - https://support.sas.com/kb/51/043.html)

      Try this:
      libname ds2help "!SASROOT\core\sashelp";
      proc ds2;
      data TEST;
      method run();
      set DS2HELP.CARS;
      end;
      enddata;
      run;
      quit;

      May the SAS be with you!
      Mark

  8. I've been having trouble getting this procedure to work in SAS University Edition. I tried copying your example and I always get this result:

    1 proc dstods2 in='/folders/myfolders/DStoDS2_data_step.sas'
    2 out='/folders/myfolders/DStoDS2_data_step_converted.sas';
    3 run;

    ERROR: Could not generate formats text file.
    ERROR: Could not compile source code.
    NOTE: The SAS System stopped processing this step because of errors.

    The syntax of the procedure is simple enough that it's hard for me to see how I might have gone wrong, maybe it's a problem with the configuration of University Edition somehow?

    • SAS Jedi

      Matthew,
      Thanks for taking the time to point this out. This post was created using PC SAS. I have replicated this error in SAS University Edition and have submitted a tech support track to see what's up. Will respond here if I get more information.
      Regards,
      Mark

        • The DStoDS2 procedure requires that the in and out files be in the current working directory for SAS. You can't include path information in the in= and out= procedure options. However, it's not hard to change the working directory (in SAS 9.4M4 or higher) using the DLGCDIR function. For example, let's say your program file is C:\temp\datastep.sas and you want the DS2 data program file produced to be named dataprogram.sas. This code would do the trick:

          %put %qsysfunc(dlgcdir(c:\temp));
          PROC DSTODS2 IN="datastep.sas"
          OUT="dataprogram.sas"
          ;
          RUN;

Back to Top